Abstract

We propose a sub-structural niching method that fully exploits the problem-decomposition
capability of linkage-learning methods, such as estimation of distribution algorithms, and
concentrates on maintaining diversity at the sub-structural level. The proposed method consists
of three key components: (1) problem decomposition and sub-structure identification, (2)
sub-structure fitness estimation, and (3) sub-structural niche preservation. The sub-structural
niching method is compared to restricted tournament selection (RTS), a niching method used in
the hierarchical Bayesian optimization algorithm, with special emphasis on sustained preservation
of multiple global solutions of a class of boundedly-difficult, additively-separable multimodal
problems. The results show that sub-structural niching successfully maintains multiple global
optima over a large number of generations and does so with a significantly smaller population than
RTS. Additionally, the market share of each niche is much closer to the expected level under
sub-structural niching than under RTS.



1 Introduction 

One of the daunting challenges in the field of genetic and evolutionary computation is the
systematic and principled design of scalable genetic algorithms (GAs), and significant progress has
been made along these lines. A design decomposition theory has been proposed, and several
competent GAs (GAs that solve hard problems quickly, reliably, and accurately) have been
developed (Goldberg, 1999). One such class of competent GAs is the estimation of distribution
algorithms (EDAs) (Pelikan, Lobo, & Goldberg, 2002; Larrañaga & Lozano, 2002). EDAs replace the
traditional variation operators of GAs with probabilistic model building of promising solutions,
which identifies key sub-structures (or building blocks) of the underlying search problem, and
sampling of the model to generate new candidate solutions. EDAs have successfully solved problems
of bounded difficulty at a single level or at multiple hierarchical levels while requiring only a polynomial
(oftentimes sub-quadratic) number of function evaluations (Pelikan, Goldberg, & Cantú-Paz, 2000;
Pelikan & Goldberg, 2001; Pelikan, Sastry, & Goldberg, 2003).

One of the important components required by EDAs for successfully solving multimodal, hierarchical,
dynamic, and multiobjective optimization problems is an efficient niching method. The niching
mechanism is required to stably maintain a diverse population throughout the search, thereby
allowing EDAs to (1) identify multiple optima reliably when solving multimodal and multiobjective
problems, (2) identify the global optimum by deciding successfully between sub-structures when
all the hierarchical interactions are revealed, and (3) rapidly identify global solutions as and when
changes occur in non-stationary problems. Such a niching method not only needs to adaptively
identify and conform to all the niches and niche-distance distributions, but also needs to maintain
them effectively over the duration of the search.

Traditional niching methods usually maintain diversity at the level of individuals and are seldom
adaptive to the niche size and distribution. Additionally, they do not exploit the underlying
working mechanism of EDAs and other linkage-learning algorithms. That is, traditional
nichers often use distance information based on the entire individual and rarely
respect or exploit problem decomposition directly. Therefore, in this paper we propose a niching
method that respects problem decomposition, utilizes the sub-structure identification capability
of EDAs, and maintains diversity at the sub-structure level in a stable manner. Such a niching
method is not only advantageous for maintaining multiple niches, but also effective for hierarchical
(Pelikan & Goldberg, 2001) and dynamic (Branke, 2001; Abbass, Sastry, & Goldberg, 2004)
problem optimization, where sub-structure niche preservation is what is actually required.

The proposed method consists of three components: (1) Sub-structure identification, where we 
use the probabilistic model built by EDAs, specifically, extended compact GA (Harik, 1999), (2)
sub-structure fitness estimation, where we use the fitness-estimation procedure proposed by Sastry 
et al (Sastry, Pelikan, & Goldberg, 2004), and (3) sub-structure niche preservation, where different 
mechanisms can be envisioned, and suitability of each is based on the purpose and objective of 
niching. The key idea of the sub-structure niche preservation mechanism is to preserve highly-fit 
sub-structures in desired proportions in the population in a stable manner over the duration of the 
search. 

The sub-structural niching mechanism is compared with restricted tournament selection 
(Harik, 1995) — a nicher used in hierarchical Bayesian optimization algorithm (hBOA) — on a class 
of boundedly-difficult additively-separable multimodal problems. Specifically, we compare (1) the 
stability of maintaining multiple niches over a large number of generations, (2) the capability of 
allocating market share to different niches at the desired level, and (3) the population size required 
to consistently maintain all the global optima. 

The paper is organized as follows. The next section provides a brief introduction to the extended 
compact genetic algorithm, followed by a detailed description of the proposed sub-structural niching 
mechanism. The performance of the proposed method is compared to that of RTS in section 4,
followed by key conclusions of the paper.



2 Extended Compact Genetic Algorithm 

Extended compact genetic algorithm (eCGA) (Harik, 1999) is an EDA that replaces traditional 
variation operators of genetic and evolutionary algorithms by building a probabilistic model of 
promising solutions and sampling the model to generate new candidate solutions. The typical steps 
of eCGA can be outlined as follows: 
1. Initialization: The population is usually initialized with random individuals. However, other
initialization procedures can also be used in a straightforward manner. 

2. Evaluation: The fitness, or quality measure, of each individual is computed.

3. Selection: Like traditional genetic algorithms, EDAs are selectionist schemes, because only a 
subset of better individuals is permitted to influence the subsequent generation of candidate 
solutions. Different selection schemes used elsewhere in genetic and evolutionary algorithms — 
tournament selection, truncation selection, proportionate selection, etc. — may be adopted for 
this purpose, but a key idea is that a "survival-of-the-fittest" mechanism is used to bias the 
generation of new individuals. 

4. Probabilistic model estimation: Unlike traditional GAs, however, EDAs assume a particular 
probabilistic model of the data, or a class of allowable models. A class-selection metric and a 
class-search mechanism are used to search for an optimum probabilistic model that represents
the selected individuals. 

Model representation: The probability distribution used in eCGA is a class of probability 
models known as marginal product models (MPMs). MPMs partition genes into mutually 
independent groups and specify marginal probabilities for each linkage group.

Class-Selection metric: To distinguish better model instances from worse ones,
eCGA uses a minimum description length (MDL) metric (Rissanen, 1978). The key concept
behind MDL models is that, all things being equal, simpler models are better than more
complex ones. The MDL metric used in eCGA is a sum of two components:

• Model complexity, which quantifies the model representation size in terms of the number
of bits required to store all the marginal probabilities:

$$C_m = \log_2(n) \sum_{i=1}^{m} \left(2^{k_i} - 1\right), \tag{1}$$

where n is the population size, m is the number of linkage groups, and $k_i$ is the size of the
i-th group.

• Compressed population complexity, which quantifies the data compression in terms
of the entropy of the marginal distribution over all partitions:

$$C_p = n \sum_{i=1}^{m} \sum_{j=1}^{2^{k_i}} -p_{ij} \log_2\left(p_{ij}\right), \tag{2}$$

where $p_{ij}$ is the frequency of the j-th gene sequence of the genes belonging to the i-th
partition.

Class-Search method: In eCGA, both the structure and the parameters of the model are 
searched and optimized to best fit the data. While the probabilities are learnt based on the 
variable instantiations in the population of selected individuals, a greedy-search heuristic is 
used to find an optimal or near-optimal probabilistic model. The search method starts by 
treating each decision variable as independent. The probabilistic model in this case is a vector 
of probabilities, representing the proportion of individuals among the selected individuals 
having a value '1' (or alternatively '0') for each variable. The model-search method continues 
by merging the two partitions that yield the greatest improvement in the model-metric score. The
subset merges are continued until no more improvement in the metric value is possible.
Figure 1: Comparison of the ideal and experimental sub-structure frequencies for different additively
separable problems: (a) trap, m = 10, k = 4; (b) trap, m = 5, k = 5; (c) bipolar, m = 5, k = 6.

5. Offspring creation: In eCGA, new individuals are created by sampling the probabilistic model. 
The offspring population is generated by randomly choosing subsets from the current
individuals according to the probabilities of the subsets as calculated in the probabilistic 
model. 

6. Replacement: Many replacement schemes generally used in genetic and evolutionary 
computation — generational replacement, elitist replacement, niching, etc. — can be used in 
EDAs, but the key idea is to replace some or all the parents with some or all the offspring. 

7. Repeat steps 2-6 until one or more termination criteria are met. 
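
The model-building step (step 4) can be illustrated with a minimal Python sketch. It assumes binary genes stored as tuples, scores a partition with the two MDL components of equations (1) and (2), and runs the greedy merge search described above. The function names `mdl_score` and `greedy_mpm_search` and the toy population are hypothetical choices for illustration, not part of eCGA itself.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_score(population, partition):
    """MDL score of a marginal product model: C_m + C_p (lower is better)."""
    n = len(population)
    # Model complexity: log2(n) * sum over groups of (2^k_i - 1).
    c_m = math.log2(n) * sum(2 ** len(group) - 1 for group in partition)
    # Compressed population complexity: n * sum of the marginal entropies.
    entropy = 0.0
    for group in partition:
        counts = Counter(tuple(ind[g] for g in group) for ind in population)
        entropy -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return c_m + n * entropy


def greedy_mpm_search(population):
    """Greedy merge search starting from fully independent genes."""
    partition = [(g,) for g in range(len(population[0]))]
    best = mdl_score(population, partition)
    while len(partition) > 1:
        trials = []
        for a, b in combinations(range(len(partition)), 2):
            rest = [grp for i, grp in enumerate(partition) if i not in (a, b)]
            merged = rest + [tuple(sorted(partition[a] + partition[b]))]
            trials.append((mdl_score(population, merged), merged))
        score, merged = min(trials, key=lambda trial: trial[0])
        if score >= best:  # no merge improves the metric: stop
            break
        best, partition = score, merged
    return sorted(partition), best


# Toy data (assumed): genes 0 and 1 always agree, gene 2 is independent.
selected = [(a, a, c) for a in (0, 1) for c in (0, 1)] * 16
print(greedy_mpm_search(selected))
```

On this toy population the search merges genes 0 and 1 into one linkage group and leaves gene 2 on its own, since any further merge raises the model complexity more than it lowers the compressed population complexity.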



3 Sub-Structural Niching



Traditional niching methods (Cavicchio, 1970; De Jong, 1975; Goldberg & Richardson, 1987;
Mahfoud, 1992; Yin & Germay, 1993; Harik, 1995; Mahfoud, 1995; Horn, 1997) maintain diversity
at the level of individuals. The effectiveness of such methods is strongly dependent on the niche
distribution. While some methods exist that can automatically adjust the niche radius
(Goldberg & Wang, 1998), they still detect diversity at the individual level.



One of the key elements of Goldberg's design decomposition (Goldberg, 1999; Goldberg, 2002),
which has been influential in the design and development of many competent GAs, suggests that
one of the critical steps for GA success is problem decomposition, and identification and mixing of
building blocks. Since EDAs work by first decomposing the search problem into sub-structures
and then creating new solutions by exchanging different sub-structures, it might be advantageous,
sometimes even necessary, to maintain diversity at the building-block (sub-structural) level and
not at the individual level (Sastry, Abbass, & Goldberg, 2004). This is especially the case for dynamic
optimization, hierarchical-problem optimization, and multiobjective optimization.

Sub-structural niching requires three key elements: 
Sub-structure identification: To maintain diversity at the sub-structural level, we first need
a mechanism to automatically identify all the important building blocks of the underlying search
problem. In this study, we use the probabilistic models built by the eCGA. However, other linkage
identification techniques (Goldberg, Korb, & Deb, 1989; Goldberg, Deb, Kargupta, & Harik, 1993;
Kargupta, 1996; Munetomo & Goldberg, 1999; Yu, Goldberg, Yassine, & Chen, 2003; Harik, 1997;
Pelikan, Lobo, & Goldberg, 2002; Larrañaga & Lozano, 2002) can be used in a straightforward
manner.

Sub-structure fitness estimation: Once key sub-structures are identified, we must decide
which sub-structures to preserve and in what proportion. While we might sometimes need to
retain all sub-structure alternatives, usually we only need to preserve the highly fit ones. To do
so, we need a way to estimate the quality of sub-structures from the fitnesses of the individuals
that possess them.

In this study, we use the fitness-estimation method proposed by Sastry, Pelikan, and Goldberg
(Sastry, Pelikan, & Goldberg, 2004). That is, after the probabilistic model is built and the linkage
map is obtained, we estimate the fitness of sub-structures. In all, we estimate the fitness of a
total of $\sum_{i=1}^{m} 2^{k_i}$ schemas. For example, for a four-bit problem whose model is [1,3] [2] [4], the
schemas whose fitnesses are estimated are: {0*0*, 0*1*, 1*0*, 1*1*, *0**, *1**, ***0, ***1}.
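
As a small illustration, the schemas implied by a linkage model can be enumerated mechanically. The sketch below assumes 0-based gene indices (so the model [1,3] [2] [4] becomes `[(0, 2), (1,), (3,)]`); the helper name `enumerate_schemata` is hypothetical.

```python
from itertools import product


def enumerate_schemata(partition, length):
    """List every schema of every linkage group; '*' marks genes outside the group."""
    schemata = []
    for group in partition:
        for bits in product("01", repeat=len(group)):
            schema = ["*"] * length
            for pos, bit in zip(group, bits):
                schema[pos] = bit
            schemata.append("".join(schema))
    return schemata


# The four-bit example from the text, written with 0-based gene indices.
model = [(0, 2), (1,), (3,)]
print(enumerate_schemata(model, 4))
# ['0*0*', '0*1*', '1*0*', '1*1*', '*0**', '*1**', '***0', '***1']
```

The list has 2^2 + 2^1 + 2^1 = 8 entries, matching the count given by the sum over linkage groups.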

The fitness of a sub-structure, h, is defined as the difference between the average fitness of
individuals that contain the schema and the average fitness of all the individuals. That is,

$$f_s(h) = \frac{1}{n_h} \sum_{\{i \mid x_i \supset h\}} f(x_i) - \frac{1}{n'} \sum_{i=1}^{n'} f(x_i), \tag{3}$$

where $n_h$ is the total number of individuals that contain the schema h, $x_i$ is the i-th individual,
$f(x_i)$ is its fitness, and n' is the total number of individuals that were evaluated. If a particular schema
is not present in the population, its fitness is arbitrarily set to zero. Furthermore, it should be noted
that the above definition of schema fitness is not unique and other estimates can be used. The key
point, however, is the use of the probabilistic model in determining the schema fitnesses. Further
details regarding the estimation method are given elsewhere (Sastry, Pelikan, & Goldberg, 2004;
Pelikan & Sastry, 2004).
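
The schema-fitness definition above translates directly into a few lines of Python. This is only a sketch: it assumes individuals and schemas are equal-length strings over '0', '1', and '*', and the function name `schema_fitness` is hypothetical.

```python
def schema_fitness(schema, population, fitnesses):
    """Mean fitness of individuals containing the schema minus the population mean."""
    matches = [
        f
        for ind, f in zip(population, fitnesses)
        if all(s == "*" or s == bit for s, bit in zip(schema, ind))
    ]
    if not matches:
        return 0.0  # absent schemata are assigned zero fitness, as stated in the text
    return sum(matches) / len(matches) - sum(fitnesses) / len(fitnesses)


# Toy data (assumed values, for illustration only).
population = ["0000", "1111", "1010", "0101"]
fitnesses = [1.0, 3.0, 2.0, 2.0]
print(schema_fitness("1*1*", population, fitnesses))  # 0.5
print(schema_fitness("***0", population, fitnesses))  # -0.5
```

Note that the estimate is relative to the population mean, so below-average schemata receive negative values.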



Sub-structure niche preservation: Having identified the sub-structures and estimated their
quality is not enough; we still need to decide on a methodology for preserving the sub-structures.
Different methods, such as fitness-proportionate, ranking, and truncation, can be used, and no one
method is better than the others. For example, we can opt to preserve the sub-structures in proportion
to their estimated fitness (the so-called fitness-proportionate method). That is, we have to modify
the sampling frequencies of the sub-structures:

$$p(h) = \frac{f_s(h)}{\sum_{i=1}^{2^{k}} f_s(h_i)}, \tag{4}$$

where the sum runs over the $2^k$ schemata $h_i$ of the k-gene partition to which h belongs.

We then use the above frequencies to sample the substructures to create new offspring. 
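
A minimal sketch of fitness-proportionate sampling is given below. It rests on two assumptions that the text does not spell out: the fitness values passed in are non-negative (how negative estimates are treated is not stated here), and a partition with zero total fitness falls back to uniform sampling. The names `proportionate_frequencies` and `sample_offspring` are hypothetical.

```python
import random


def proportionate_frequencies(fitness_by_schema):
    """Map {sub-structure: fitness} within one partition to sampling frequencies."""
    if any(f < 0 for f in fitness_by_schema.values()):
        raise ValueError("this sketch assumes non-negative fitness estimates")
    total = sum(fitness_by_schema.values())
    if total == 0:
        # Assumed fallback: no information, so sample uniformly.
        return {h: 1 / len(fitness_by_schema) for h in fitness_by_schema}
    return {h: f / total for h, f in fitness_by_schema.items()}


def sample_offspring(model, rng):
    """model: list of (gene_positions, {bit_string: frequency}) pairs, one per partition."""
    length = sum(len(positions) for positions, _ in model)
    child = [""] * length
    for positions, freqs in model:
        bits = rng.choices(list(freqs), weights=list(freqs.values()), k=1)[0]
        for pos, bit in zip(positions, bits):
            child[pos] = bit
    return "".join(child)


freqs = proportionate_frequencies({"00": 1.0, "11": 1.0, "01": 0.0, "10": 0.0})
model = [((0, 2), freqs), ((1, 3), {"01": 1.0})]
print(sample_offspring(model, random.Random(0)))
```

Each partition is sampled independently, so an offspring is assembled from whole sub-structures rather than from individual genes.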

Regardless of how the sub-structure preservation is done, the key idea is to preserve those
sub-structures that are potentially highly fit and are a part of different global optima. The different
sub-structure preservation methods usually require modification of the sampling frequencies of
sub-structures used in EDAs to generate new candidate solutions.

Before we use the proposed method for sustained maintenance of multiple global optima, we need
to verify whether the sub-structure fitness estimate is accurate and whether the method is capable of
preserving different sub-structures over time. For the verification, we use the fitness-proportionate method;
that is, we maintain different sub-structures in proportion to their estimated fitness. We first
investigate the accuracy of relative fitness estimates of different sub-structures in a given partition.
The results for two different additively decomposable problems, the m-k deceptive trap (Ackley, 1987;
Goldberg, 1987; Deb & Goldberg, 1992) and the m-k bipolar function (Goldberg, Deb, & Horn, 1992),
are shown in figure 1. To demonstrate that the niching method can maintain all sub-structures in
a sustained manner over time, we plot the market share of each of the 16 schemata for the 10-4
deceptive trap function in figure 2. The results show that the sub-structure fitness estimation is
quite accurate and that it can preserve the sub-structures at their desired proportions over time.

Figure 2: Illustration of sub-structure preservation via fitness-proportionate method for a 10-4
deceptive trap function.



4 Results and Discussion 

In this section, we investigate the effectiveness of sub-structural niching in stably maintaining
all the global optima over a large number of generations, and the population size required to do so,
as a function of the number of optima. We note that in all the results presented in this paper, we
consider the fitness-proportionate sub-structural niche preservation mechanism as a proof of principle;
as mentioned earlier, other mechanisms can be more beneficial depending on the goal
of using the niching mechanism. Additionally, the window size for RTS was set to the problem
size, as suggested elsewhere (Pelikan & Goldberg, 2001). Before presenting the results, we first give
a brief description of the test problem considered in the experiments.
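
For readers unfamiliar with RTS, one replacement step can be sketched as follows. This is a generic illustration of restricted tournament selection, not the exact code used in the experiments: each offspring competes with the most similar (in Hamming distance) of `window` randomly chosen population members and replaces it only if the offspring is fitter. The function names are hypothetical, and the sketch assumes `window` does not exceed the population size.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def rts_insert(population, fitnesses, offspring, offspring_fitness, window, rng):
    """One RTS step, in place. Returns the replaced index, or None if nothing changed."""
    candidates = rng.sample(range(len(population)), window)
    nearest = min(candidates, key=lambda i: hamming(population[i], offspring))
    if offspring_fitness > fitnesses[nearest]:
        population[nearest] = offspring
        fitnesses[nearest] = offspring_fitness
        return nearest
    return None


# Toy usage (assumed data): fitness is the number of 1s in the string.
population = [[0, 0, 0, 0], [1, 1, 1, 1]]
fitnesses = [0, 4]
rng = random.Random(0)
print(rts_insert(population, fitnesses, [1, 1, 1, 0], 3, window=2, rng=rng))  # None
print(rts_insert(population, fitnesses, [0, 0, 0, 1], 1, window=2, rng=rng))  # 0
```

Because competition is restricted to similar individuals, an offspring near one optimum does not displace members of a distant niche.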

Our approach in verifying the performance of sub-structural niching is to consider bounding
adversarial problems that exploit one or more dimensions of problem difficulty (Goldberg, 2002).
In particular, we are interested in problems where building-block identification is critical for GA
success. Additionally, the problem solver (eCGA) should not have any knowledge of the building-block
structure of the test problem, although that structure should be known to researchers for verification purposes.

One such class of problems is the m-k deceptive trap problem, which consists of additively
separable deceptive functions (Ackley, 1987; Goldberg, 1987; Deb & Goldberg, 1992). Deceptive
functions are designed to thwart the very mechanism of selectorecombinative search by punishing
any localized hillclimbing and requiring mixing of whole building blocks at or above the order
of deception. Using such adversarially designed functions is a stiff test (in some sense the stiffest
test) of algorithm performance. The idea is that if an algorithm can beat an adversarially designed
test function, it can solve other problems that are equally hard or easier than the adversarial
function.

In this study, we use a modified m-4 deceptive trap problem where both 0000 and 1111 have
equal fitness. Therefore, there are $2^m$ global optima with identical fitness. That is, each k-bit
trap is defined as follows:

$$\mathrm{trap}_k(u) = \begin{cases} 1 & \text{if } u = k, \\ 1 & \text{if } u = 0, \\ 0.75\left(1 - \dfrac{u}{k-1}\right) & \text{otherwise,} \end{cases} \tag{5}$$

where u is the number of 1s in the input string of k bits.
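
The modified trap can be sketched in Python as below. The "otherwise" branch is an assumption: it uses the standard deceptive slope 0.75(1 - u/(k-1)), which is consistent with the surviving pieces of the definition but should be checked against the original paper. The function names are hypothetical.

```python
from itertools import product


def trap(u, k):
    """Modified k-bit trap: both u = 0 and u = k are global optima."""
    if u == 0 or u == k:
        return 1.0
    # Assumed deceptive slope for the remaining values of u.
    return 0.75 * (1 - u / (k - 1))


def mk_trap(bits, k):
    """Additively separable m-k trap: sum of traps over consecutive k-bit blocks."""
    return sum(trap(sum(bits[i:i + k]), k) for i in range(0, len(bits), k))


# Brute-force check on a small instance (m = 2, k = 4): expect 2^m = 4 global optima.
m, k = 2, 4
optima = [bits for bits in product((0, 1), repeat=m * k) if mk_trap(bits, k) == m]
print(len(optima))  # 4
```

The brute-force count illustrates why the number of global optima grows as 2^m: each block can independently be all 0s or all 1s.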

First, we compare the ability of RTS and sub-structural niching to maintain all the global
optima over time in a stable manner (see figure 3). We start by comparing the single GA run
behavior of both niching methods in figures 3(a) and 3(b), where we show the proportion of
individuals in each of the 32 global optima of a 5-4 trap function as a function of time. The results
clearly show that, in contrast to RTS, the niche maintenance of sub-structural niching is highly
stable and the allocated market share for each optimum is in agreement with the desired proportion of
1/32 = 0.03125¹. We then consider the average behavior of both RTS and sub-structural niching,
where we plot the average market share of each of the global optima over time in figures 3(c) and
3(d). The lines above the bars in figures 3(c) and 3(d) depict the standard deviation, and the
optimal solution ID refers to an arbitrary (but unique) number for each of the 32 global optima. For
simplicity, we also plot the average, average minimum, and average maximum market share of an
optimum in figures 3(e) and 3(f). Figures 3(c) through 3(f) clearly show that RTS cannot stably maintain
the global optima, even at large population sizes, when compared to sub-structural niching.
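
The market-share measurement discussed above can be sketched as follows, assuming individuals are stored as bit strings. The helpers `global_optima` and `market_shares` are hypothetical names introduced only for this illustration.

```python
from collections import Counter
from itertools import product


def global_optima(m, k):
    """All 2^m optima of the modified m-k trap: each block is all 0s or all 1s."""
    return ["".join(bit * k for bit in blocks) for blocks in product("01", repeat=m)]


def market_shares(population, optima):
    """Fraction of the population sitting on each global optimum."""
    counts = Counter(population)
    return {opt: counts[opt] / len(population) for opt in optima}


optima = global_optima(5, 4)
print(len(optima))  # 32

# An idealized population holding ten copies of every optimum.
ideal = optima * 10
print(set(market_shares(ideal, optima).values()))  # {0.03125}
```

A niching method that allocates market share as desired should keep every entry of this dictionary close to 1/32 for the 5-4 trap.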

Figure 3 clearly indicates the overall effectiveness of sub-structural niching in stably maintaining
all the global optima at the desired proportion over a large number of generations. We note that
the global optima are detected faster with RTS than with sub-structural niching. This is to be
expected, as sub-structural niching maintains diversity in all sub-structures in proportion to their
fitness, and it takes longer for mixing to home in on the global optima. However, once the optima
are found, sub-structural niching preserves them much more stably than RTS.

We also studied the effect of population size, n, on the success probability of maintaining the
global optima, $\gamma$, the results of which are shown for the 5-4 deceptive trap function in figure 4. The
figure plots the probability of maintaining all the global optima for different numbers of generations
as a function of population size for both RTS and sub-structural niching. As shown in the figure,
RTS requires larger population sizes to maintain the global optima for a longer time. This is a
well-understood phenomenon of traditional nichers and has been analyzed by Mahfoud for fitness sharing
(Mahfoud, 1994). However, in sub-structural niching, the population size required to achieve a
certain success probability, $\gamma$, is independent of the number of generations for which we would like
to maintain the niches. Additionally, RTS requires a significantly larger population than sub-structural
niching to achieve the same level of success probability.

¹ Since all the global optima have identical fitness, we expect the market share of each optimum to be
1/32 = 0.03125.

Figure 3: Comparison of the performance of RTS and sub-structural niching in stably maintaining
all global optima for a concatenated 5-4 deceptive trap problem: (a) & (b) single GA run behavior,
(c) & (d) average behavior, and (e) & (f) average, average minimum, and average maximum
proportion allocated to an optimum. The maintenance of all the optima by RTS is very noisy.

Figure 4: The probability of maintaining at least one copy of all the global optima, $\gamma$, over different
numbers of generations as a function of population size for RTS and sub-structural niching. RTS
requires a significantly larger population to maintain all the global optima than sub-structural
niching. The results are averaged over 50 independent GA runs.

Finally, we use the n versus $\gamma$ results to determine the population size required to
successfully maintain all the global optima with high probability. Specifically, we plot the minimum
population size required by sub-structural niching and RTS for maintaining at least one copy of $n_{opt} - 1$ or
more global optima in the population for different numbers of generations as a function of the number
of optima, $n_{opt}$, in figure 5. The lines plotted for the RTS results are from the population-sizing
model of Mahfoud (Mahfoud, 1994):

$$n \propto \frac{\log\left[\left(1 - \gamma^{1/t}\right)/n_{opt}\right]}{\log\left[\left(n_{opt} - 1\right)/n_{opt}\right]}, \tag{6}$$

where t is the number of generations for which we need to maintain all the niches. Figure 5 clearly shows
that sub-structural niching requires a significantly smaller population than RTS to maintain all the
global solutions with a high probability. We note that the population size for sub-structural niching
will eventually grow linearly with $n_{opt}$ (in accordance with the population-sizing model), as we need
at least one individual for each of the optima. However, this might not be the case for hierarchical
and dynamic optimization problems, where diversity is required only at the sub-structural level and
not at the solution level. Nevertheless, sub-structural niching requires orders of magnitude smaller
populations than RTS and stably maintains niches with a high probability even for a problem with
about a thousand global optima.
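
To give a feel for how the population-sizing model behaves, the sketch below evaluates the right-hand side of the expression above, treating the proportionality as an equality (an assumption made only for illustration). The inverse function is obtained by solving the same expression for the success probability; both function names are hypothetical, and the number of optima must be at least 2.

```python
import math


def rts_population_size(gamma, t, n_opt):
    """log[(1 - gamma^(1/t)) / n_opt] / log[(n_opt - 1) / n_opt]."""
    return math.log((1 - gamma ** (1 / t)) / n_opt) / math.log((n_opt - 1) / n_opt)


def maintenance_probability(n, t, n_opt):
    """The same relation solved for gamma: [1 - n_opt * ((n_opt - 1)/n_opt)^n]^t."""
    return (1 - n_opt * ((n_opt - 1) / n_opt) ** n) ** t


# Population size suggested by the model for 32 optima and a 90% success probability.
for t in (10, 100, 1000):
    print(t, round(rts_population_size(0.9, t, 32)))
```

The required size grows with the number of generations t, which mirrors the observation that RTS needs larger populations to maintain the optima for a longer time.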



5 Summary and Conclusions 

In this paper, we proposed a sub-structural niching mechanism which, in contrast to traditional
niching mechanisms, exploits the problem-decomposition capability of estimation of distribution
algorithms and stably maintains diversity at the sub-structure, or building-block, level rather than
at the individual level. The sub-structural niching mechanism consists of three components: (1)
sub-structure identification, where we use the probabilistic model built by EDAs, specifically, extended
compact GA (Harik, 1999), (2) sub-structure fitness estimation, where we use the fitness-estimation
procedure proposed by Sastry et al. (Sastry, Pelikan, & Goldberg, 2004), and (3) sub-structure niche
preservation, where different mechanisms can be envisioned, each suitable depending on the purpose
and objective of using the niching method. Regardless of how it is done, the key idea of
the sub-structure niche preservation mechanism is to preserve highly-fit sub-structures in desired
proportions in the population in a stable manner over the duration of the search.

Figure 5: [...] optima for RTS and sub-structural niching. The results for RTS conform to the population-sizing
model of Mahfoud (Mahfoud, 1994) (equation 6). The results are averaged over 50 independent GA runs.

We also tested the performance of the sub-structural niching mechanism on a class of boundedly-difficult,
additively-separable multimodal problems and compared it with that of restricted tournament
selection (RTS), a niching method used in the hierarchical Bayesian optimization algorithm.
The results show that the sub-structural niching mechanism not only stably preserves multiple
global optima over a large number of generations, but does so with a high probability while requiring
a significantly smaller population than RTS. The results indicate that sub-structural niching can be
particularly effective for hierarchical and dynamic problem optimization.